Algorithms for Finding Multivariate Discriminant Rules for Classification and Regression Trees
Abstract
Progress in data-input technologies, such as POS (Point Of Sales) systems, and in data-storage technologies, such as high-density magnetic and optical recording devices, has made it easy for enterprises to collect massive amounts of data and store them on hard disk at very low cost. Since the early 1990s, many enterprises have been interested in extracting previously unnoticed information from these huge databases, information that may inspire new marketing strategies. Technologies for extracting such information, or knowledge, from huge databases are called "data mining." Data mining covers association analysis, classification and regression, cluster analysis, and evolution analysis, most of which have been widely studied in the fields of databases, statistics, and machine learning. In general, data mining emphasizes efficiency, so that it can handle emerging databases too large to be processed by conventional techniques.
In this dissertation, the author focused on association analysis and on classification and regression. The author first considered association rules on numerical attributes, whereas conventional data mining is effective only for categorical attributes; this work significantly expanded the applications of association analysis. The author then explored classification and regression problems and, by utilizing techniques developed for numerical association rules, proposed accurate and comprehensive classification and regression trees. The work on classification and regression is the author's primary contribution.
Huge databases often contain many attributes, with many correlations among them. However, conventional data mining techniques cannot handle such correlations well. In the statistics literature, multivariate analysis methods have been used to handle correlations in numerical databases.
Many statistical methods, such as "principal component analysis" and "factor analysis," are categorized as multivariate analysis. Most multivariate analysis methods assume linear correlations, so they are effective for data whose correlations are linear. However, data can contain various types of correlations that these conventional methods cannot handle. To handle such correlations, the author proposed multivariate optimized discriminant rules, which can be defined on more than one attribute, and presented efficient algorithms for finding them. The algorithms efficiently find multivariate discriminant rules
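The abstract does not spell out what an association rule on a numerical attribute looks like. The sketch below assumes one widely used formulation, which is not necessarily the author's exact definition: an "optimized support" rule (A ∈ [low, high]) ⇒ C. The range is chosen to maximize support (here, the number of tuples in the range) while keeping confidence at or above a threshold. The function name `optimized_support_range`, the equi-depth bucketing, and the toy `ages`/`buys` data are all hypothetical. The search is deliberately naive, quadratic in the number of buckets, to favor clarity over speed.

```python
from typing import List, Optional, Tuple


def optimized_support_range(
    values: List[float],
    hits: List[bool],
    num_buckets: int,
    min_conf: float,
) -> Optional[Tuple[float, float, int, float]]:
    """Hypothetical helper: find the contiguous range [low, high] of a numeric
    attribute A with maximum support (tuple count) among ranges whose
    confidence for the rule (A in [low, high]) => C is at least min_conf.

    Returns (low, high, support, confidence), or None if no range qualifies.
    """
    if not values or len(values) != len(hits) or num_buckets < 1:
        raise ValueError("need equally long, non-empty inputs and num_buckets >= 1")

    # Sort tuples by A and cut them into m equi-depth buckets; with m <= n,
    # integer division guarantees that every bucket is non-empty.
    pairs = sorted(zip(values, hits))
    n = len(pairs)
    m = min(num_buckets, n)
    cuts = [i * n // m for i in range(m + 1)]
    buckets = [pairs[cuts[i]:cuts[i + 1]] for i in range(m)]

    # Prefix sums over buckets: all tuples, and tuples satisfying C.
    total = [0]
    hit = [0]
    for b in buckets:
        total.append(total[-1] + len(b))
        hit.append(hit[-1] + sum(1 for _, h in b if h))

    best = None
    # Naive search over all O(m^2) contiguous bucket ranges [s, t].
    # Tuples with equal A values may straddle a bucket boundary;
    # this sketch does not merge them.
    for s in range(m):
        for t in range(s, m):
            support = total[t + 1] - total[s]
            conf = (hit[t + 1] - hit[s]) / support
            if conf >= min_conf and (best is None or support > best[2]):
                best = (buckets[s][0][0], buckets[t][-1][0], support, conf)
    return best


if __name__ == "__main__":
    # Toy data (hypothetical): customer ages and whether each customer bought.
    ages = [21, 25, 30, 34, 38, 42, 55, 63]
    buys = [False, False, True, True, True, True, False, False]
    print(optimized_support_range(ages, buys, num_buckets=8, min_conf=1.0))
    # prints (30, 42, 4, 1.0)
```

At a tree node, a range found this way could serve as the binary test "A ∈ [low, high]". The abstract does not state whether the dissertation's classification and regression trees use this exact criterion.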
Publication date: 2002